Estimating Quality in User-Guided Multi-Objective Bandits Optimization
Abstract
Many real-world applications are characterized by a number of conflicting performance measures. As optimizing in a multi-objective setting leads to a set of non-dominated solutions, a preference function is required for selecting the solution with the appropriate trade-off between the objectives. This preference function is often unknown, especially when it comes from an expert human user. However, if we could provide the expert user with a proper estimation for each action, she would be able to pick her best choice. The question is: how good do these estimations have to be in order for her choice to remain the same as if she had access to the exact values? In this paper, we introduce the concept of preference radius to characterize the robustness of the preference function and provide guidelines for controlling the quality of estimations in the multi-objective setting. More specifically, we provide a general formulation of multi-objective optimization under the bandits setting and the pure exploration setting with user feedback for articulating the preferences. We show how the preference radius relates to the optimal gap and how it can be used to analyze algorithms in the bandits and pure exploration settings. We finally present experiments in the bandits setting, where we evaluate the impact of noise and delayed expert user feedback, and in the pure exploration setting, where we compare multi-objective Thompson sampling with uniform sampling.
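The question the abstract poses can be illustrated with a small Python sketch. It filters a set of actions down to the non-dominated ones, lets a stand-in user pick among them, and then checks whether that pick survives when the exact values are replaced by estimates. Everything here is an assumption made for illustration: the reward vectors, the linear `preference` function, and all function names are hypothetical, and the code does not implement the paper's preference radius or any of its algorithms.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b on every objective and
    strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: List[Vector]) -> List[int]:
    """Indices of the actions that no other action dominates."""
    return [
        i
        for i, a in enumerate(values)
        if not any(dominates(b, a) for j, b in enumerate(values) if j != i)
    ]


def preferred_action(values: List[Vector], preference: Callable[[Vector], float]) -> int:
    """Index of the non-dominated action that the preference function ranks highest."""
    return max(pareto_front(values), key=lambda i: preference(values[i]))


def preference(v: Vector) -> float:
    """Hypothetical user: weighs both objectives equally.
    In the paper's setting this function is unknown."""
    return 0.5 * v[0] + 0.5 * v[1]


# Hypothetical exact expected rewards for four actions, two objectives each.
exact = [(0.9, 0.1), (0.6, 0.6), (0.1, 0.9), (0.5, 0.5)]

# Two hypothetical sets of estimates: one close to the exact values, one far off.
close_estimates = [(0.88, 0.12), (0.58, 0.61), (0.12, 0.88), (0.5, 0.5)]
poor_estimates = [(1.0, 0.3), (0.5, 0.5), (0.1, 0.9), (0.5, 0.5)]

print("non-dominated actions:", pareto_front(exact))
print("choice with exact values:", preferred_action(exact, preference))
print("choice with close estimates:", preferred_action(close_estimates, preference))
print("choice with poor estimates:", preferred_action(poor_estimates, preference))
```

In this toy example the close estimates leave the user's choice unchanged, while the poor estimates move it to a different action. How much estimation error a given preference function tolerates before the choice changes is the kind of robustness the abstract says the preference radius is meant to characterize.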
Similar resources
Different Network Performance Measures in a Multi-Objective Traffic Assignment Problem
Traffic assignment algorithms are used to determine possible use of paths between origin-destination pairs and predict traffic flow in network links. One of the main deficiencies of ordinary traffic assignment methods is that in most of them one measure (mostly travel time) is usually included in objective function and other effective performance measures in traffic assignment are not considere...
Interactive Thompson Sampling for Multi-objective Multi-armed Bandits
In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solution sets for unknown utility functions of users, based on the stochastic reward vectors only. In online MORL on the other hand, the agent will often be able to elicit preferences from the user, enabling it to learn about the utility function of its user directly. In this paper, we study online MO...
Pareto Local Search for Alternative Clustering
Supervised alternative clusterings is the problem of finding a set of clusterings which are of high quality and different from a given negative clustering. The task is therefore a clear multi-objective optimization problem. Optimizing two conflicting objectives at the same time requires dealing with tradeoffs. Most approaches in the literature optimize these objectives sequentially (one objecti...
Multi-Objective X-Armed Bandits
Many of the standard optimization algorithms focus on optimizing a single, scalar feedback signal. However, real-life optimization problems often require a simultaneous optimization of more than one objective. In this paper, we propose a multi-objective extension to the standard X-armed bandit problem. As the feedback signal is now vector-valued, the goal of the agent is to sample actions in t...
Bandwidth and Delay Optimization by Integrating of Software Trust Estimator with Multi-User Cloud Resource Competence
Trust Establishment is one of the significant resources to enhance the scalability and reliability of resources in the cloud environment. To establish a novel trust model on SaaS (Software as a Service) cloud resources and to optimize the resource utilization of multiple user requests, an integrated software trust estimator with multi-user resource competence (IST-MRC) optimization mechanism is...
Journal title:
- CoRR
Volume abs/1701.01095, Issue -
Pages -
Publication date 2017